
58 research outputs found

Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture

Author: Dalton Jeff
Li Zhenghua
Lin Jimmy
Mishne Gilad
Sharma Aneesh
Publication date: 27/10/2012

We present the architecture behind Twitter's real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time "twist": after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of real-time data processing in the era of "big data". We tell the story of how our system was built twice: our first implementation was built on a typical Hadoop-based analytics stack, but was later replaced because it did not meet the latency requirements necessary to generate meaningful real-time results. The second implementation, which is the system deployed in production, is a custom in-memory processing engine specifically designed for the task. This experience taught us that the current typical usage of Hadoop as a "big data" platform, while great for experimentation, is not well suited to low-latency processing, and points the way to future work on data analytics platforms that can handle "big" as well as "fast" data

arXiv.org e-Print Archive

CiteSeerX

Semantic lexicon adaptation for use in query interpretation

Author: Ana-maria Popescu
Gilad Mishne
Patrick Pantel
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2010

We describe improvements to the use of semantic lexicons by a state-of-the-art query interpretation system powering a major search engine. We successfully compute concept label importance information for lexicon strings; lexicon augmentation with such information leads to a 6.4% precision increase on affected queries with no query coverage loss. Finally, lexicon filtering based on label importance leads to a 13% precision increase, but at the expense of query coverage

CiteSeerX

Crossref

Wikum: Bridging Discussion Forums and Wikis Using Recursive Summarization

Author: Ackerman Mark S
Cheng Justin
Ganesan Kavita
Luther Kurt
Mishne Gilad
Nenkova Ani
Shapiro Amy
Verroios Vasilis
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/02/2017

Large-scale discussions between many participants abound on the internet today, on topics ranging from political arguments to group coordination. But as these discussions grow to tens of thousands of posts, they become ever more difficult for a reader to digest. In this article, we describe a workflow called recursive summarization, implemented in our Wikum prototype, that enables a large population of readers or editors to work in small doses to refine out the main points of the discussion. More than just a single summary, our workflow produces a summary tree that enables a reader to explore distinct subtopics at multiple levels of detail based on their interests. We describe lab evaluations showing that (i) Wikum can be used more effectively than a control to quickly construct a summary tree and (ii) the summary tree is more effective than the original discussion in helping readers identify and explore the main topics

DSpace@MIT

Crossref

Multiple ranking strategies for opinion retrieval in blogs

Author: Gilad Mishne
Publication date: 01/01/2006

We describe our participation in the Opinion Retrieval task at TREC 2006. Our approach to identifying opinions in blog posts consisted of scoring the posts separately on various aspects associated with an expression of opinion about a topic, including shallow sentiment analysis, spam detection, and link-based authority estimation. The separate approaches were combined into a single ranking, yielding significant improvement over a content-only baseline

CiteSeerX

International Migration, Integration and Social Cohesion online publications

Using blog properties to improve retrieval

Author: Gilad Mishne
Publication date: 01/01/2007

This paper describes three simple heuristics which improve opinion retrieval effectiveness by using blog-specific properties. Blog timestamps are used to increase the retrieval scores of blog posts published near the time of a significant event related to a query; an inexpensive approach to comment amount estimation is used to identify the level of opinion expressed in a post; and query-specific weights are used to change the importance of spam filtering for different types of queries. Overall, these methods, combined with non-blog-specific retrieval approaches, result in substantial improvements over the state of the art

CiteSeerX

International Migration, Integration and Social Cohesion online publications

Miscellaneous General Terms Languages, Management

Author: Gilad Mishne

We describe a system for automating call-center analysis and monitoring. Our system integrates transcription of incoming calls with analysis of their content; for the analysis, we introduce a novel method of estimating the domain-specific importance of conversation fragments, based on divergence of corpus statistics. Combining this method with Information Retrieval approaches, we provide knowledge-mining tools both for the call-center agents and for administrators of the center

CiteSeerX